{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {
    "collapsed": true,
    "pycharm": {
     "name": "#%% md\n"
    }
   },
   "source": [
    "## PS1-4 Convexity of Generalized Linear Models"
   ]
  },
  {
   "cell_type": "markdown",
   "source": [
    "#### (a)"
   ],
   "metadata": {
    "collapsed": false
   }
  },
  {
   "cell_type": "markdown",
   "source": [
    "Proof:\n",
    "\n",
    "\\begin{align*}\n",
    "\\frac{\\partial}{\\partial \\eta} p(y; \\ \\eta) & = \\frac{\\partial}{\\partial \\eta} \\big( b(y) \\exp (\\eta y - a(\\eta)) \\big) \\\\\n",
    "                                            & = b(y) \\exp (\\eta y - a(\\eta)) \\big( y - \\frac{\\partial}{\\partial \\eta} a(\\eta) \\big) \\\\\n",
    "                                            & = y \\ p(y; \\ \\eta) - p(y; \\ \\eta) \\frac{\\partial}{\\partial \\eta} a(\\eta)\n",
    "\\end{align*}\n",
    "\n",
    "Rearranging gives:\n",
    "\n",
    "$$y \\ p(y; \\ \\eta) = \\frac{\\partial}{\\partial \\eta} p(y; \\ \\eta) + p(y; \\ \\eta) \\frac{\\partial}{\\partial \\eta} a(\\eta)$$\n",
    "\n",
    "Therefore, assuming differentiation with respect to $\\eta$ and integration over $y$ can be interchanged,\n",
    "\n",
    "\\begin{align*}\n",
    "\\mathbb{E} [ \\ Y; \\ \\eta \\ ] & = \\mathbb{E} [ \\ Y \\ \\vert \\ X; \\ \\eta \\ ] \\\\\n",
    "                             & = \\int y \\ p(y; \\ \\eta) \\ dy \\\\\n",
    "                             & = \\int \\Big( \\frac{\\partial}{\\partial \\eta} p(y; \\ \\eta) + p(y; \\ \\eta) \\frac{\\partial}{\\partial \\eta} a(\\eta) \\Big) \\ dy \\\\\n",
    "                             & = \\int \\frac{\\partial}{\\partial \\eta} p(y; \\ \\eta) \\ dy + \\int p(y; \\ \\eta) \\frac{\\partial}{\\partial \\eta} a(\\eta) \\ dy \\\\\n",
    "                             & = \\frac{\\partial}{\\partial \\eta} \\int p(y; \\ \\eta) \\ dy + \\frac{\\partial}{\\partial \\eta} a(\\eta) \\int p(y; \\ \\eta) \\ dy \\\\\n",
    "                             & = \\frac{\\partial}{\\partial \\eta} 1 + \\frac{\\partial}{\\partial \\eta} a(\\eta) \\cdot 1 \\\\\n",
    "                             & = 0 + \\frac{\\partial}{\\partial \\eta} a(\\eta) \\\\\n",
    "                             & = \\frac{\\partial}{\\partial \\eta} a(\\eta)\n",
    "\\end{align*}\n",
    "\n",
    "That is, the mean of an exponential family distribution is the first derivative of the log-partition function $a(\\eta)$ with respect to the natural parameter $\\eta$."
   ],
   "metadata": {
    "collapsed": false,
    "pycharm": {
     "name": "#%% md\n"
    }
   }
  },
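  {
   "cell_type": "markdown",
   "source": [
    "As a sanity check on this result, here is a worked example that is not part of the original problem: the Bernoulli distribution with mean $\\phi$, chosen purely for illustration. Since $y$ only takes the values 0 and 1, the integrals above become sums over $y$.\n",
    "\n",
    "\\begin{align*}\n",
    "p(y; \\ \\phi) & = \\phi^y (1 - \\phi)^{1 - y} \\\\\n",
    "             & = \\exp \\Big( y \\log \\frac{\\phi}{1 - \\phi} + \\log (1 - \\phi) \\Big)\n",
    "\\end{align*}\n",
    "\n",
    "Matching this against $b(y) \\exp (\\eta y - a(\\eta))$ gives $b(y) = 1$, $\\eta = \\log \\frac{\\phi}{1 - \\phi}$ and $a(\\eta) = - \\log (1 - \\phi) = \\log (1 + e^\\eta)$. Then\n",
    "\n",
    "$$\\frac{\\partial}{\\partial \\eta} a(\\eta) = \\frac{e^\\eta}{1 + e^\\eta} = \\phi = \\mathbb{E} [ \\ Y; \\ \\eta \\ ]$$\n",
    "\n",
    "which agrees with the known mean of a Bernoulli variable."
   ],
   "metadata": {
    "collapsed": false,
    "pycharm": {
     "name": "#%% md\n"
    }
   }
  },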
  {
   "cell_type": "markdown",
   "source": [
    "#### (b)"
   ],
   "metadata": {
    "collapsed": false,
    "pycharm": {
     "name": "#%% md\n"
    }
   }
  },
  {
   "cell_type": "markdown",
   "source": [
    "Proof:\n",
    "\n",
    "\\begin{align*}\n",
    "\\frac{\\partial^2}{\\partial \\eta^2} p(y; \\ \\eta) & = \\frac{\\partial}{\\partial \\eta} \\big( y \\ p(y; \\ \\eta) - p(y; \\ \\eta) \\frac{\\partial}{\\partial \\eta} a(\\eta) \\big) \\\\\n",
    "                                                & = y \\frac{\\partial}{\\partial \\eta} p(y; \\ \\eta) - \\frac{\\partial}{\\partial \\eta} p(y; \\ \\eta) \\frac{\\partial}{\\partial \\eta} a(\\eta) - p(y; \\ \\eta) \\frac{\\partial^2}{\\partial \\eta^2} a(\\eta) \\\\\n",
    "                                                & = y^2 p(y; \\ \\eta) - 2 \\ y \\ p(y; \\ \\eta) \\frac{\\partial}{\\partial \\eta} a(\\eta) + p(y; \\ \\eta) \\big( \\frac{\\partial}{\\partial \\eta} a(\\eta) \\big)^2 - p(y; \\ \\eta) \\frac{\\partial^2}{\\partial \\eta^2} a(\\eta) \\\\\n",
    "                                                & = \\big( y - \\frac{\\partial}{\\partial \\eta} a(\\eta) \\big)^2 p(y; \\ \\eta) - p(y; \\ \\eta) \\frac{\\partial^2}{\\partial \\eta^2} a(\\eta)\n",
    "\\end{align*}\n",
    "\n",
    "where the third line substitutes $\\frac{\\partial}{\\partial \\eta} p(y; \\ \\eta) = y \\ p(y; \\ \\eta) - p(y; \\ \\eta) \\frac{\\partial}{\\partial \\eta} a(\\eta)$. This implies:\n",
    "\n",
    "$$\\big( y - \\frac{\\partial}{\\partial \\eta} a(\\eta) \\big)^2 p(y; \\ \\eta) = \\frac{\\partial^2}{\\partial \\eta^2} p(y; \\ \\eta) + p(y; \\ \\eta) \\frac{\\partial^2}{\\partial \\eta^2} a(\\eta)$$\n",
    "\n",
    "Hence, using $\\mathbb{E} [ \\ Y; \\ \\eta \\ ] = \\frac{\\partial}{\\partial \\eta} a(\\eta)$ from (a) and again interchanging differentiation and integration,\n",
    "\n",
    "\\begin{align*}\n",
    "\\mathrm{Var} [ \\ Y; \\ \\eta \\ ] & = \\int (y - \\mathbb{E} [ \\ Y; \\ \\eta \\ ])^2 p(y; \\ \\eta) \\ dy \\\\\n",
    "                               & = \\int \\big( y - \\frac{\\partial}{\\partial \\eta} a(\\eta) \\big)^2 p(y; \\ \\eta) \\ dy \\\\\n",
    "                               & = \\int \\Big( \\frac{\\partial^2}{\\partial \\eta^2} p(y; \\ \\eta) + p(y; \\ \\eta) \\frac{\\partial^2}{\\partial \\eta^2} a(\\eta) \\Big) \\ dy \\\\\n",
    "                               & = \\int \\frac{\\partial^2}{\\partial \\eta^2} p(y; \\ \\eta) \\ dy + \\int p(y; \\ \\eta) \\frac{\\partial^2}{\\partial \\eta^2} a(\\eta) \\ dy \\\\\n",
    "                               & = \\frac{\\partial^2}{\\partial \\eta^2} \\int p(y; \\ \\eta) \\ dy + \\frac{\\partial^2}{\\partial \\eta^2} a(\\eta) \\int p(y; \\ \\eta) \\ dy \\\\\n",
    "                               & = \\frac{\\partial^2}{\\partial \\eta^2} 1 + \\frac{\\partial^2}{\\partial \\eta^2} a(\\eta) \\cdot 1 \\\\\n",
    "                               & = 0 + \\frac{\\partial^2}{\\partial \\eta^2} a(\\eta) \\\\\n",
    "                               & = \\frac{\\partial^2}{\\partial \\eta^2} a(\\eta)\n",
    "\\end{align*}\n",
    "\n",
    "That is, the variance of an exponential family distribution is the second derivative of the log-partition function $a(\\eta)$ with respect to the natural parameter $\\eta$."
   ],
   "metadata": {
    "collapsed": false,
    "pycharm": {
     "name": "#%% md\n"
    }
   }
  },
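  {
   "cell_type": "markdown",
   "source": [
    "To illustrate this with a concrete case that is not part of the original problem, consider the Bernoulli distribution with mean $\\phi$. Its exponential family form has natural parameter $\\eta = \\log \\frac{\\phi}{1 - \\phi}$ and log-partition function $a(\\eta) = \\log (1 + e^\\eta)$, so\n",
    "\n",
    "\\begin{align*}\n",
    "\\frac{\\partial}{\\partial \\eta} a(\\eta) & = \\frac{e^\\eta}{1 + e^\\eta} = \\phi \\\\\n",
    "\\frac{\\partial^2}{\\partial \\eta^2} a(\\eta) & = \\frac{e^\\eta}{(1 + e^\\eta)^2} = \\frac{e^\\eta}{1 + e^\\eta} \\cdot \\frac{1}{1 + e^\\eta} = \\phi (1 - \\phi)\n",
    "\\end{align*}\n",
    "\n",
    "Computing the variance directly, with $Y^2 = Y$ for a variable taking only the values 0 and 1, gives the same value:\n",
    "\n",
    "$$\\mathrm{Var} [ \\ Y; \\ \\eta \\ ] = \\mathbb{E} [ \\ Y^2; \\ \\eta \\ ] - \\mathbb{E} [ \\ Y; \\ \\eta \\ ]^2 = \\phi - \\phi^2 = \\phi (1 - \\phi)$$"
   ],
   "metadata": {
    "collapsed": false,
    "pycharm": {
     "name": "#%% md\n"
    }
   }
  },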
  {
   "cell_type": "markdown",
   "source": [
    "#### (c)"
   ],
   "metadata": {
    "collapsed": false,
    "pycharm": {
     "name": "#%% md\n"
    }
   }
  },
  {
   "cell_type": "markdown",
   "source": [
    "Recall that for a GLM with $T(y) = y$ and natural parameter $\\eta = \\theta^T x$, the NLL of a single example $(x, y)$ is\n",
    "\n",
    "\\begin{align*}\n",
    "\\ell (\\theta) & = - \\log p(y; \\ \\eta) \\\\\n",
    "              & = - \\log \\big( b(y) \\exp (\\eta y - a(\\eta)) \\big) \\\\\n",
    "              & = a(\\eta) - \\eta y - \\log b(y) \\\\\n",
    "              & = a(\\theta^T x) - \\theta^T x \\, y - \\log b(y)\n",
    "\\end{align*}\n",
    "\n",
    "Applying the chain rule with $\\eta = \\theta^T x$, the gradient of $\\ell$ is\n",
    "\n",
    "$$\\nabla_\\theta \\ell (\\theta) = x \\frac{\\partial}{\\partial \\eta} a(\\eta) - y x$$\n",
    "\n",
    "and the Hessian of $\\ell$ is\n",
    "\n",
    "$$H = \\nabla_\\theta^2 \\ell (\\theta) = x x^T \\frac{\\partial^2}{\\partial \\eta^2} a(\\eta)$$\n",
    "\n",
    "For any $z \\in \\mathbb{R}^n$, we have\n",
    "\n",
    "\\begin{align*}\n",
    "z^T H z & = z^T \\big( x x^T \\frac{\\partial^2}{\\partial \\eta^2} a(\\eta) \\big) z \\\\\n",
    "        & = z^T x x^T z \\frac{\\partial^2}{\\partial \\eta^2} a(\\eta) \\\\\n",
    "        & = (x^T z)^2 \\frac{\\partial^2}{\\partial \\eta^2} a(\\eta) \\\\\n",
    "        & = (x^T z)^2 \\ \\mathrm{Var} [ \\ Y; \\ \\eta \\ ] \\\\\n",
    "        & \\geq 0\n",
    "\\end{align*}\n",
    "\n",
    "where the step to $\\mathrm{Var} [ \\ Y; \\ \\eta \\ ]$ uses the result of (b), and the inequality holds because both factors are nonnegative. Hence the Hessian is positive semidefinite (PSD), so the NLL of a single example is convex in $\\theta$. The NLL over a dataset is a sum of such terms, and a sum of convex functions is convex, so the NLL loss of a GLM is convex."
   ],
   "metadata": {
    "collapsed": false,
    "pycharm": {
     "name": "#%% md\n"
    }
   }
  }
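  ,
  {
   "cell_type": "markdown",
   "source": [
    "As an illustration that goes beyond the original problem, the argument can be instantiated for logistic regression, i.e. a GLM with a Bernoulli response, for which $a(\\eta) = \\log (1 + e^\\eta)$. The symbol $h_\\theta (x)$ below is notation introduced here for the sigmoid of $\\theta^T x$.\n",
    "\n",
    "\\begin{align*}\n",
    "h_\\theta (x) & = \\frac{1}{1 + e^{- \\theta^T x}} \\\\\n",
    "\\nabla_\\theta \\ell (\\theta) & = \\big( h_\\theta (x) - y \\big) x \\\\\n",
    "H & = h_\\theta (x) \\big( 1 - h_\\theta (x) \\big) x x^T \\\\\n",
    "z^T H z & = h_\\theta (x) \\big( 1 - h_\\theta (x) \\big) (x^T z)^2 \\geq 0\n",
    "\\end{align*}\n",
    "\n",
    "The inequality holds because $0 < h_\\theta (x) < 1$, so the scalar factor $h_\\theta (x) (1 - h_\\theta (x))$ is exactly the Bernoulli variance and is strictly positive."
   ],
   "metadata": {
    "collapsed": false,
    "pycharm": {
     "name": "#%% md\n"
    }
   }
  }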
 ],
 "metadata": {
  "kernelspec": {
   "name": "cs229",
   "language": "python",
   "display_name": "cs229"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 2
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython2",
   "version": "2.7.6"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 0
}